Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 4600 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 503.2 KiB |
| Average record size in memory | 112.0 B |
Variable types
| Numeric | 11 |
|---|---|
| Categorical | 3 |
| price is highly correlated with sqft_living and 1 other field | High correlation |
|---|---|
| bedrooms is highly correlated with bathrooms and 1 other field | High correlation |
| bathrooms is highly correlated with bedrooms and 4 other fields | High correlation |
| sqft_living is highly correlated with price and 3 other fields | High correlation |
| floors is highly correlated with yr_built | High correlation |
| sqft_above is highly correlated with price and 4 other fields | High correlation |
| yr_built is highly correlated with bathrooms and 3 other fields | High correlation |
| condition is highly correlated with yr_built | High correlation |
| sqft_basement is highly correlated with bathrooms and 2 other fields | High correlation |
| yr_renovated is highly correlated with yr_built | High correlation |
| price is highly skewed (γ1 = 24.79093256) | Skewed |
| price has 49 (1.1%) zeros | Zeros |
| sqft_basement has 2745 (59.7%) zeros | Zeros |
| yr_renovated has 2735 (59.5%) zeros | Zeros |
| city has 123 (2.7%) zeros | Zeros |
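The percentages in the Zeros alerts follow from the zero counts and the 4600 observations. A minimal sketch of that arithmetic, where the helper name `zero_share` is hypothetical and not part of pandas-profiling:

```python
# Zero counts copied from the alerts above; n_rows is the number of observations.
n_rows = 4600
zero_counts = {"price": 49, "sqft_basement": 2745, "yr_renovated": 2735, "city": 123}


def zero_share(count, total):
    """Share of zero-valued cells as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"


zero_alerts = {name: zero_share(count, n_rows) for name, count in zero_counts.items()}
for name, share in zero_alerts.items():
    print(f"{name} has {zero_counts[name]} ({share}) zeros")
```

For sqft_basement and yr_renovated a zero most plausibly encodes "no basement" or "never renovated" rather than a measured value, which is an interpretation and not something the report states.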
Reproduction
| Analysis started | 2022-11-21 06:48:24.217965 |
|---|---|
| Analysis finished | 2022-11-21 06:48:41.505127 |
| Duration | 17.29 seconds |
| Software version | pandas-profiling v3.4.0 |
| Download configuration | config.json |
price
Real number (ℝ≥0)
| Distinct | 1741 |
|---|---|
| Distinct (%) | 37.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 551962.9885 |
| Minimum | 0 |
| Maximum | 26590000 |
| Zeros | 49 |
| Zeros (%) | 1.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 200000 |
| Q1 | 322875 |
| median | 460943.4615 |
| Q3 | 654962.5 |
| 95-th percentile | 1184050 |
| Maximum | 26590000 |
| Range | 26590000 |
| Interquartile range (IQR) | 332087.5 |
Descriptive statistics
| Standard deviation | 563834.7025 |
|---|---|
| Coefficient of variation (CV) | 1.021508171 |
| Kurtosis | 1044.352151 |
| Mean | 551962.9885 |
| Median Absolute Deviation (MAD) | 157500 |
| Skewness | 24.79093256 |
| Sum | 2539029747 |
| Variance | 3.179095718 × 10^11 |
| Monotonicity | Not monotonic |
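Several of the derived statistics above can be recomputed from other values in the same tables. The following sketch does that for price using only numbers copied from the report. It is a consistency check, not the code pandas-profiling runs.

```python
# Values copied from the price tables above.
mean, std = 551962.9885, 563834.7025
q1, q3 = 322875.0, 654962.5
minimum, maximum = 0.0, 26590000.0

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.9f} variance={variance:.9e} IQR={iqr} range={value_range}")
```

A CV above 1 means the standard deviation exceeds the mean, which fits the heavy right tail flagged by the skewness alert.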
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 49 | 1.1% |
| 300000 | 42 | 0.9% |
| 400000 | 31 | 0.7% |
| 440000 | 29 | 0.6% |
| 450000 | 29 | 0.6% |
| 600000 | 29 | 0.6% |
| 350000 | 28 | 0.6% |
| 250000 | 27 | 0.6% |
| 435000 | 27 | 0.6% |
| 415000 | 27 | 0.6% |
| Other values (1731) | 4282 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 49 | 1.1% |
| 7800 | 1 | < 0.1% |
| 80000 | 1 | < 0.1% |
| 83000 | 1 | < 0.1% |
| 83300 | 2 | < 0.1% |
| 84350 | 1 | < 0.1% |
| 87500 | 1 | < 0.1% |
| 90000 | 2 | < 0.1% |
| 100000 | 4 | 0.1% |
| 102500 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 26590000 | 1 | |
| 12899000 | 1 | |
| 7062500 | 1 | |
| 4668000 | 1 | |
| 4489000 | 1 | |
| 3800000 | 1 | |
| 3710000 | 1 | |
| 3200000 | 1 | |
| 3100000 | 1 | |
| 3000000 | 1 |
bedrooms
Real number (ℝ≥0)
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.400869565 |
| Minimum | 0 |
| Maximum | 9 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.9088481155 |
|---|---|
| Coefficient of variation (CV) | 0.2672399215 |
| Kurtosis | 1.235377429 |
| Mean | 3.400869565 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.456446633 |
| Sum | 15644 |
| Variance | 0.8260048971 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2032 | |
| 4 | 1531 | |
| 2 | 566 | 12.3% |
| 5 | 353 | 7.7% |
| 6 | 61 | 1.3% |
| 1 | 38 | 0.8% |
| 7 | 14 | 0.3% |
| 8 | 2 | < 0.1% |
| 0 | 2 | < 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% |
| 1 | 38 | 0.8% |
| 2 | 566 | 12.3% |
| 3 | 2032 | |
| 4 | 1531 | |
| 5 | 353 | 7.7% |
| 6 | 61 | 1.3% |
| 7 | 14 | 0.3% |
| 8 | 2 | < 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 1 | < 0.1% |
| 8 | 2 | < 0.1% |
| 7 | 14 | 0.3% |
| 6 | 61 | 1.3% |
| 5 | 353 | 7.7% |
| 4 | 1531 | |
| 3 | 2032 | |
| 2 | 566 | 12.3% |
| 1 | 38 | 0.8% |
| 0 | 2 | < 0.1% |
bathrooms
Real number (ℝ≥0)
| Distinct | 26 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.160815217 |
| Minimum | 0 |
| Maximum | 8 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1.75 |
| median | 2.25 |
| Q3 | 2.5 |
| 95-th percentile | 3.5 |
| Maximum | 8 |
| Range | 8 |
| Interquartile range (IQR) | 0.75 |
Descriptive statistics
| Standard deviation | 0.7837810747 |
|---|---|
| Coefficient of variation (CV) | 0.3627247107 |
| Kurtosis | 1.86590471 |
| Mean | 2.160815217 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | 0.6160327234 |
| Sum | 9939.75 |
| Variance | 0.614312773 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=26)
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 1189 | |
| 1 | 743 | |
| 1.75 | 629 | |
| 2 | 427 | 9.3% |
| 2.25 | 419 | 9.1% |
| 1.5 | 291 | 6.3% |
| 2.75 | 276 | 6.0% |
| 3 | 167 | 3.6% |
| 3.5 | 162 | 3.5% |
| 3.25 | 136 | 3.0% |
| Other values (16) | 161 | 3.5% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% |
| 0.75 | 17 | 0.4% |
| 1 | 743 | |
| 1.25 | 3 | 0.1% |
| 1.5 | 291 | 6.3% |
| 1.75 | 629 | |
| 2 | 427 | 9.3% |
| 2.25 | 419 | 9.1% |
| 2.5 | 1189 | |
| 2.75 | 276 | 6.0% |
| Value | Count | Frequency (%) |
|---|---|---|
| 8 | 1 | < 0.1% |
| 6.75 | 1 | < 0.1% |
| 6.5 | 1 | < 0.1% |
| 6.25 | 2 | < 0.1% |
| 5.75 | 1 | < 0.1% |
| 5.5 | 4 | 0.1% |
| 5.25 | 4 | 0.1% |
| 5 | 6 | 0.1% |
| 4.75 | 7 | 0.2% |
| 4.5 | 29 |
sqft_living
Real number (ℝ≥0)
| Distinct | 566 |
|---|---|
| Distinct (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2139.346957 |
| Minimum | 370 |
| Maximum | 13540 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 370 |
|---|---|
| 5-th percentile | 950 |
| Q1 | 1460 |
| median | 1980 |
| Q3 | 2620 |
| 95-th percentile | 3870 |
| Maximum | 13540 |
| Range | 13170 |
| Interquartile range (IQR) | 1160 |
Descriptive statistics
| Standard deviation | 963.2069158 |
|---|---|
| Coefficient of variation (CV) | 0.4502340833 |
| Kurtosis | 8.2916826 |
| Mean | 2139.346957 |
| Median Absolute Deviation (MAD) | 570 |
| Skewness | 1.723513271 |
| Sum | 9840996 |
| Variance | 927767.5626 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 1940 | 32 | 0.7% |
| 1720 | 32 | 0.7% |
| 1660 | 31 | 0.7% |
| 1840 | 31 | 0.7% |
| 2000 | 30 | 0.7% |
| 1410 | 29 | 0.6% |
| 1200 | 28 | 0.6% |
| 1480 | 28 | 0.6% |
| 1700 | 27 | 0.6% |
| 1490 | 27 | 0.6% |
| Other values (556) | 4305 |
| Value | Count | Frequency (%) |
|---|---|---|
| 370 | 1 | |
| 380 | 1 | |
| 420 | 1 | |
| 430 | 1 | |
| 490 | 1 | |
| 520 | 1 | |
| 550 | 1 | |
| 560 | 1 | |
| 580 | 1 | |
| 590 | 2 |
| Value | Count | Frequency (%) |
|---|---|---|
| 13540 | 1 | |
| 10040 | 1 | |
| 9640 | 1 | |
| 8670 | 1 | |
| 8020 | 1 | |
| 7320 | 1 | |
| 7270 | 1 | |
| 7050 | 1 | |
| 6980 | 1 | |
| 6900 | 1 |
sqft_lot
Real number (ℝ≥0)
| Distinct | 3113 |
|---|---|
| Distinct (%) | 67.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14852.51609 |
| Minimum | 638 |
| Maximum | 1074218 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 638 |
|---|---|
| 5-th percentile | 1690.8 |
| Q1 | 5000.75 |
| median | 7683 |
| Q3 | 11001.25 |
| 95-th percentile | 43560 |
| Maximum | 1074218 |
| Range | 1073580 |
| Interquartile range (IQR) | 6000.5 |
Descriptive statistics
| Standard deviation | 35884.43614 |
|---|---|
| Coefficient of variation (CV) | 2.416050987 |
| Kurtosis | 219.8729874 |
| Mean | 14852.51609 |
| Median Absolute Deviation (MAD) | 2772 |
| Skewness | 11.30713875 |
| Sum | 68321574 |
| Variance | 1287692757 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 5000 | 80 | 1.7% |
| 6000 | 65 | 1.4% |
| 4000 | 54 | 1.2% |
| 7200 | 50 | 1.1% |
| 4800 | 29 | 0.6% |
| 4500 | 25 | 0.5% |
| 9600 | 25 | 0.5% |
| 3000 | 23 | 0.5% |
| 5500 | 23 | 0.5% |
| 7500 | 23 | 0.5% |
| Other values (3103) | 4203 |
| Value | Count | Frequency (%) |
|---|---|---|
| 638 | 1 | |
| 681 | 1 | |
| 704 | 1 | |
| 746 | 1 | |
| 747 | 1 | |
| 750 | 1 | |
| 779 | 1 | |
| 833 | 1 | |
| 835 | 1 | |
| 844 | 2 |
| Value | Count | Frequency (%) |
|---|---|---|
| 1074218 | 1 | |
| 641203 | 1 | |
| 478288 | 1 | |
| 435600 | 2 | |
| 423838 | 1 | |
| 389126 | 1 | |
| 327135 | 1 | |
| 307752 | 1 | |
| 306848 | 1 | |
| 284011 | 1 |
floors
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.512065217 |
| Minimum | 1 |
| Maximum | 3.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1.5 |
| Q3 | 2 |
| 95-th percentile | 2 |
| Maximum | 3.5 |
| Range | 2.5 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.5382883773 |
|---|---|
| Coefficient of variation (CV) | 0.3559954763 |
| Kurtosis | -0.5388519795 |
| Mean | 1.512065217 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | 0.5514406463 |
| Sum | 6955.5 |
| Variance | 0.2897543771 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2174 | |
| 2 | 1811 | |
| 1.5 | 444 | 9.7% |
| 3 | 128 | 2.8% |
| 2.5 | 41 | 0.9% |
| 3.5 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 2174 | |
| 1.5 | 444 | 9.7% |
| 2 | 1811 | |
| 2.5 | 41 | 0.9% |
| 3 | 128 | 2.8% |
| 3.5 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 3.5 | 2 | < 0.1% |
| 3 | 128 | 2.8% |
| 2.5 | 41 | 0.9% |
| 2 | 1811 | |
| 1.5 | 444 | 9.7% |
| 1 | 2174 |
waterfront
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 36.1 KiB |
| 0 | 4567 |
|---|---|
| 1 | 33 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 4600 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4567 | 99.3% |
| 1 | 33 | 0.7% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4567 | 99.3% |
| 1 | 33 | 0.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4567 | 99.3% |
| 1 | 33 | 0.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 4600 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4567 | 99.3% |
| 1 | 33 | 0.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 4600 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4567 | 99.3% |
| 1 | 33 | 0.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 4600 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4567 | 99.3% |
| 1 | 33 | 0.7% |
view
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 36.1 KiB |
| 0 | 4140 |
|---|---|
| 2 | 205 |
| 3 | 116 |
| 4 | 70 |
| 1 | 69 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 4600 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 4 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4140 | 90.0% |
| 2 | 205 | 4.5% |
| 3 | 116 | 2.5% |
| 4 | 70 | 1.5% |
| 1 | 69 | 1.5% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4140 | 90.0% |
| 2 | 205 | 4.5% |
| 3 | 116 | 2.5% |
| 4 | 70 | 1.5% |
| 1 | 69 | 1.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4140 | 90.0% |
| 2 | 205 | 4.5% |
| 3 | 116 | 2.5% |
| 4 | 70 | 1.5% |
| 1 | 69 | 1.5% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 4600 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4140 | 90.0% |
| 2 | 205 | 4.5% |
| 3 | 116 | 2.5% |
| 4 | 70 | 1.5% |
| 1 | 69 | 1.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 4600 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4140 | 90.0% |
| 2 | 205 | 4.5% |
| 3 | 116 | 2.5% |
| 4 | 70 | 1.5% |
| 1 | 69 | 1.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 4600 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4140 | 90.0% |
| 2 | 205 | 4.5% |
| 3 | 116 | 2.5% |
| 4 | 70 | 1.5% |
| 1 | 69 | 1.5% |
condition
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 36.1 KiB |
| 3 | 2875 |
|---|---|
| 4 | 1252 |
| 5 | 435 |
| 2 | 32 |
| 1 | 6 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 4600 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 3 |
|---|---|
| 2nd row | 5 |
| 3rd row | 4 |
| 4th row | 4 |
| 5th row | 4 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2875 | 62.5% |
| 4 | 1252 | 27.2% |
| 5 | 435 | 9.5% |
| 2 | 32 | 0.7% |
| 1 | 6 | 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2875 | 62.5% |
| 4 | 1252 | 27.2% |
| 5 | 435 | 9.5% |
| 2 | 32 | 0.7% |
| 1 | 6 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2875 | 62.5% |
| 4 | 1252 | 27.2% |
| 5 | 435 | 9.5% |
| 2 | 32 | 0.7% |
| 1 | 6 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 4600 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2875 | 62.5% |
| 4 | 1252 | 27.2% |
| 5 | 435 | 9.5% |
| 2 | 32 | 0.7% |
| 1 | 6 | 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 4600 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2875 | 62.5% |
| 4 | 1252 | 27.2% |
| 5 | 435 | 9.5% |
| 2 | 32 | 0.7% |
| 1 | 6 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 4600 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 2875 | 62.5% |
| 4 | 1252 | 27.2% |
| 5 | 435 | 9.5% |
| 2 | 32 | 0.7% |
| 1 | 6 | 0.1% |
sqft_above
Real number (ℝ≥0)
| Distinct | 511 |
|---|---|
| Distinct (%) | 11.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1827.265435 |
| Minimum | 370 |
| Maximum | 9410 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 370 |
|---|---|
| 5-th percentile | 860 |
| Q1 | 1190 |
| median | 1590 |
| Q3 | 2300 |
| 95-th percentile | 3440 |
| Maximum | 9410 |
| Range | 9040 |
| Interquartile range (IQR) | 1110 |
Descriptive statistics
| Standard deviation | 862.168977 |
|---|---|
| Coefficient of variation (CV) | 0.4718356515 |
| Kurtosis | 4.070138265 |
| Mean | 1827.265435 |
| Median Absolute Deviation (MAD) | 490 |
| Skewness | 1.494210748 |
| Sum | 8405421 |
| Variance | 743335.3448 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 1200 | 47 | 1.0% |
| 1010 | 47 | 1.0% |
| 1300 | 45 | 1.0% |
| 1140 | 44 | 1.0% |
| 1320 | 43 | 0.9% |
| 1150 | 42 | 0.9% |
| 1090 | 40 | 0.9% |
| 1180 | 40 | 0.9% |
| 1400 | 38 | 0.8% |
| 1050 | 37 | 0.8% |
| Other values (501) | 4177 |
| Value | Count | Frequency (%) |
|---|---|---|
| 370 | 1 | < 0.1% |
| 380 | 1 | < 0.1% |
| 420 | 1 | < 0.1% |
| 430 | 1 | < 0.1% |
| 490 | 1 | < 0.1% |
| 520 | 1 | < 0.1% |
| 550 | 3 | |
| 560 | 1 | < 0.1% |
| 580 | 1 | < 0.1% |
| 590 | 2 |
| Value | Count | Frequency (%) |
|---|---|---|
| 9410 | 1 | |
| 8020 | 1 | |
| 7680 | 1 | |
| 7320 | 1 | |
| 6640 | 1 | |
| 6430 | 1 | |
| 6420 | 1 | |
| 6120 | 1 | |
| 6070 | 1 | |
| 6050 | 1 |
sqft_basement
Real number (ℝ≥0)
| Distinct | 207 |
|---|---|
| Distinct (%) | 4.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 312.0815217 |
| Minimum | 0 |
| Maximum | 4820 |
| Zeros | 2745 |
| Zeros (%) | 59.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 610 |
| 95-th percentile | 1210 |
| Maximum | 4820 |
| Range | 4820 |
| Interquartile range (IQR) | 610 |
Descriptive statistics
| Standard deviation | 464.1372281 |
|---|---|
| Coefficient of variation (CV) | 1.487230726 |
| Kurtosis | 4.082380024 |
| Mean | 312.0815217 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.642732192 |
| Sum | 1435575 |
| Variance | 215423.3665 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2745 | 59.7% |
| 500 | 53 | 1.2% |
| 600 | 45 | 1.0% |
| 800 | 43 | 0.9% |
| 900 | 41 | 0.9% |
| 700 | 38 | 0.8% |
| 1000 | 33 | 0.7% |
| 400 | 33 | 0.7% |
| 550 | 27 | 0.6% |
| 750 | 26 | 0.6% |
| Other values (197) | 1516 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2745 | 59.7% |
| 20 | 1 | < 0.1% |
| 50 | 1 | < 0.1% |
| 60 | 2 | < 0.1% |
| 65 | 1 | < 0.1% |
| 70 | 1 | < 0.1% |
| 80 | 3 | 0.1% |
| 90 | 2 | < 0.1% |
| 100 | 14 | 0.3% |
| 110 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 4820 | 1 | |
| 4130 | 1 | |
| 2850 | 1 | |
| 2730 | 1 | |
| 2550 | 2 | |
| 2360 | 1 | |
| 2330 | 1 | |
| 2300 | 1 | |
| 2200 | 1 | |
| 2180 | 1 |
yr_built
Real number (ℝ≥0)
| Distinct | 115 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1970.786304 |
| Minimum | 1900 |
| Maximum | 2014 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 1900 |
|---|---|
| 5-th percentile | 1913 |
| Q1 | 1951 |
| median | 1976 |
| Q3 | 1997 |
| 95-th percentile | 2009 |
| Maximum | 2014 |
| Range | 114 |
| Interquartile range (IQR) | 46 |
Descriptive statistics
| Standard deviation | 29.73184839 |
|---|---|
| Coefficient of variation (CV) | 0.0150862873 |
| Kurtosis | -0.6700759004 |
| Mean | 1970.786304 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | -0.50215519 |
| Sum | 9065617 |
| Variance | 883.9828087 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 2006 | 111 | 2.4% |
| 2005 | 104 | 2.3% |
| 2007 | 93 | 2.0% |
| 2004 | 92 | 2.0% |
| 1978 | 90 | 2.0% |
| 2003 | 89 | 1.9% |
| 2008 | 89 | 1.9% |
| 1967 | 82 | 1.8% |
| 1977 | 80 | 1.7% |
| 2014 | 78 | 1.7% |
| Other values (105) | 3692 |
| Value | Count | Frequency (%) |
|---|---|---|
| 1900 | 22 | |
| 1901 | 9 | 0.2% |
| 1902 | 10 | 0.2% |
| 1903 | 10 | 0.2% |
| 1904 | 9 | 0.2% |
| 1905 | 19 | |
| 1906 | 27 | |
| 1907 | 12 | |
| 1908 | 19 | |
| 1909 | 22 |
| Value | Count | Frequency (%) |
|---|---|---|
| 2014 | 78 | |
| 2013 | 57 | |
| 2012 | 33 | 0.7% |
| 2011 | 24 | 0.5% |
| 2010 | 28 | 0.6% |
| 2009 | 50 | |
| 2008 | 89 | |
| 2007 | 93 | |
| 2006 | 111 | |
| 2005 | 104 |
yr_renovated
Real number (ℝ≥0)
| Distinct | 60 |
|---|---|
| Distinct (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 808.6082609 |
| Minimum | 0 |
| Maximum | 2014 |
| Zeros | 2735 |
| Zeros (%) | 59.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1999 |
| 95-th percentile | 2011 |
| Maximum | 2014 |
| Range | 2014 |
| Interquartile range (IQR) | 1999 |
Descriptive statistics
| Standard deviation | 979.4145364 |
|---|---|
| Coefficient of variation (CV) | 1.211234888 |
| Kurtosis | -1.851110913 |
| Mean | 808.6082609 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.3859187009 |
| Sum | 3719598 |
| Variance | 959252.8341 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2735 | 59.5% |
| 2000 | 170 | 3.7% |
| 2003 | 151 | 3.3% |
| 2009 | 109 | 2.4% |
| 2001 | 109 | 2.4% |
| 2005 | 95 | 2.1% |
| 2004 | 77 | 1.7% |
| 2014 | 72 | 1.6% |
| 2006 | 68 | 1.5% |
| 2013 | 61 | 1.3% |
| Other values (50) | 953 | 20.7% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2735 | 59.5% |
| 1912 | 33 | 0.7% |
| 1913 | 1 | < 0.1% |
| 1923 | 57 | 1.2% |
| 1934 | 6 | 0.1% |
| 1945 | 7 | 0.2% |
| 1948 | 1 | < 0.1% |
| 1953 | 1 | < 0.1% |
| 1954 | 8 | 0.2% |
| 1955 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 2014 | 72 | |
| 2013 | 61 | |
| 2012 | 45 | |
| 2011 | 54 | |
| 2010 | 30 | 0.7% |
| 2009 | 109 | |
| 2008 | 45 | |
| 2007 | 7 | 0.2% |
| 2006 | 68 | |
| 2005 | 95 |
city
Real number (ℝ≥0)
| Distinct | 44 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.546521739 |
| Minimum | 0 |
| Maximum | 43 |
| Zeros | 123 |
| Zeros (%) | 2.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 36.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 4 |
| Q3 | 14 |
| 95-th percentile | 27 |
| Maximum | 43 |
| Range | 43 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 9.162660506 |
|---|---|
| Coefficient of variation (CV) | 1.072092342 |
| Kurtosis | 0.8150428561 |
| Mean | 8.546521739 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.21114425 |
| Sum | 39314 |
| Variance | 83.95434754 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=44)
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1573 | |
| 18 | 293 | 6.4% |
| 3 | 286 | 6.2% |
| 4 | 235 | 5.1% |
| 14 | 187 | 4.1% |
| 13 | 187 | 4.1% |
| 2 | 185 | 4.0% |
| 9 | 176 | 3.8% |
| 8 | 175 | 3.8% |
| 12 | 148 | 3.2% |
| Other values (34) | 1155 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 123 | 2.7% |
| 1 | 1573 | |
| 2 | 185 | 4.0% |
| 3 | 286 | 6.2% |
| 4 | 235 | 5.1% |
| 5 | 96 | 2.1% |
| 6 | 50 | 1.1% |
| 7 | 36 | 0.8% |
| 8 | 175 | 3.8% |
| 9 | 176 | 3.8% |
| Value | Count | Frequency (%) |
|---|---|---|
| 43 | 2 | < 0.1% |
| 42 | 2 | < 0.1% |
| 41 | 1 | < 0.1% |
| 40 | 6 | 0.1% |
| 39 | 1 | < 0.1% |
| 38 | 28 | |
| 37 | 11 | 0.2% |
| 36 | 29 | |
| 35 | 4 | 0.1% |
| 34 | 29 |
Auto
The auto setting is an easily interpretable pairwise column metric that picks a method from the types of the two columns (vartype-vartype : method): categorical-categorical : Cramér's V; numerical-categorical : Cramér's V (using a discretized numerical column); numerical-numerical : Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
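The rank-based calculation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the routine pandas-profiling calls: the function names are hypothetical, ties receive average ranks, and the sample values are the first five sqft_living and price entries from the "First rows" table of this report.

```python
def average_ranks(values):
    """Rank from 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = (sum((a - mean_x) ** 2 for a in rx) / n) ** 0.5
    std_y = (sum((b - mean_y) ** 2 for b in ry) / n) ** 0.5
    return cov / (std_x * std_y)


sqft_living = [1340, 3650, 1930, 2000, 1940]
price = [313000.0, 2384000.0, 342000.0, 420000.0, 550000.0]
rho_sample = spearman_rho(sqft_living, price)

# A strictly increasing but nonlinear relation still gives a rank correlation of 1.
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
print(rho_sample, rho_cubic)
```

Five rows are far too few to say anything about the full dataset. The sample only shows the mechanics.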
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
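As a small sketch of the definition above and of the invariance claim, the following plain-Python function computes r for made-up data and then again after shifting and rescaling one variable. The name `pearson_r` and all sample values are illustrative assumptions.

```python
def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r_rescaled = pearson_r([10 * a + 7 for a in x], y)  # shift and positive rescale of x
r_steep = pearson_r(x, [3 * a for a in x])          # exact line, slope 3
r_flat = pearson_r(x, [0.5 * a for a in x])         # exact line, slope 0.5
print(r, r_rescaled, r_steep, r_flat)
```

Both exact lines give r = 1 regardless of slope, and rescaling x by a positive factor leaves r unchanged. A negative scale factor would flip the sign of r.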
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
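The pair-counting formula above corresponds to the variant usually called τ-a. A minimal sketch follows, assuming tied pairs count as neither concordant nor discordant. Library implementations often apply a tie adjustment instead, so their results can differ on data with ties. The function name is hypothetical, and the sample values are the first five sqft_living and price entries from the "First rows" table.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


sqft_living = [1340, 3650, 1930, 2000, 1940]
price = [313000.0, 2384000.0, 342000.0, 420000.0, 550000.0]
tau_sample = kendall_tau_a(sqft_living, price)
print(tau_sample)
```

Of the ten pairs in this sample, nine are concordant and one is discordant (the 2000 sq ft home sold for less than the 1940 sq ft one).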
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
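Returning to the correlation measures, the bias-corrected Cramér's V mentioned in the Cramér's V description can be sketched as follows. The correction terms are written from the commonly cited form of Bergsma's estimator and should be read as an assumption, not as the exact code pandas-profiling runs. The function name and sample data are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals, col_totals = Counter(a), Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_label, row_total in row_totals.items():
        for col_label, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (joint[(row_label, col_label)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Assumed correction: shrink phi^2 and the table dimensions by terms of order 1/(n-1).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


labels = ["x", "y"] * 50
v_perfect = cramers_v_corrected(labels, labels)
v_independent = cramers_v_corrected(["x", "x", "y", "y"] * 25, ["p", "q", "p", "q"] * 25)
print(v_perfect, v_independent)
```

Identical columns give a value of 1 and a perfectly balanced, independent pair gives 0, matching the range described for the coefficient.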
First rows
| | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | sqft_above | sqft_basement | yr_built | yr_renovated | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 313000.0 | 3.0 | 1.50 | 1340 | 7912 | 1.5 | 0 | 0 | 3 | 1340 | 0 | 1955 | 2005 | 0 |
| 1 | 2384000.0 | 5.0 | 2.50 | 3650 | 9050 | 2.0 | 0 | 4 | 5 | 3370 | 280 | 1921 | 0 | 1 |
| 2 | 342000.0 | 3.0 | 2.00 | 1930 | 11947 | 1.0 | 0 | 0 | 4 | 1930 | 0 | 1966 | 0 | 2 |
| 3 | 420000.0 | 3.0 | 2.25 | 2000 | 8030 | 1.0 | 0 | 0 | 4 | 1000 | 1000 | 1963 | 0 | 3 |
| 4 | 550000.0 | 4.0 | 2.50 | 1940 | 10500 | 1.0 | 0 | 0 | 4 | 1140 | 800 | 1976 | 1992 | 4 |
| 5 | 490000.0 | 2.0 | 1.00 | 880 | 6380 | 1.0 | 0 | 0 | 3 | 880 | 0 | 1938 | 1994 | 1 |
| 6 | 335000.0 | 2.0 | 2.00 | 1350 | 2560 | 1.0 | 0 | 0 | 3 | 1350 | 0 | 1976 | 0 | 4 |
| 7 | 482000.0 | 4.0 | 2.50 | 2710 | 35868 | 2.0 | 0 | 0 | 3 | 2710 | 0 | 1989 | 0 | 5 |
| 8 | 452500.0 | 3.0 | 2.50 | 2430 | 88426 | 1.0 | 0 | 0 | 4 | 1570 | 860 | 1985 | 0 | 6 |
| 9 | 640000.0 | 4.0 | 2.00 | 1520 | 6200 | 1.5 | 0 | 0 | 3 | 1520 | 0 | 1945 | 2010 | 1 |
Last rows
| | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | sqft_above | sqft_basement | yr_built | yr_renovated | city |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4590 | 380680.555556 | 4.0 | 2.50 | 2620 | 8331 | 2.0 | 0 | 0 | 3 | 2620 | 0 | 1991 | 0 | 18 |
| 4591 | 396166.666667 | 3.0 | 1.75 | 1880 | 5752 | 1.0 | 0 | 0 | 4 | 940 | 940 | 1945 | 0 | 1 |
| 4592 | 252980.000000 | 4.0 | 2.50 | 2530 | 8169 | 2.0 | 0 | 0 | 3 | 2530 | 0 | 1993 | 0 | 12 |
| 4593 | 289373.307692 | 3.0 | 2.50 | 2538 | 4600 | 2.0 | 0 | 0 | 3 | 2538 | 0 | 2013 | 1923 | 9 |
| 4594 | 210614.285714 | 3.0 | 2.50 | 1610 | 7223 | 2.0 | 0 | 0 | 3 | 1610 | 0 | 1994 | 0 | 2 |
| 4595 | 308166.666667 | 3.0 | 1.75 | 1510 | 6360 | 1.0 | 0 | 0 | 4 | 1510 | 0 | 1954 | 1979 | 1 |
| 4596 | 534333.333333 | 3.0 | 2.50 | 1460 | 7573 | 2.0 | 0 | 0 | 3 | 1460 | 0 | 1983 | 2009 | 3 |
| 4597 | 416904.166667 | 3.0 | 2.50 | 3010 | 7014 | 2.0 | 0 | 0 | 3 | 3010 | 0 | 2009 | 0 | 18 |
| 4598 | 203400.000000 | 4.0 | 2.00 | 2090 | 6630 | 1.0 | 0 | 0 | 3 | 1070 | 1020 | 1974 | 0 | 1 |
| 4599 | 220600.000000 | 3.0 | 2.50 | 1490 | 8102 | 2.0 | 0 | 0 | 4 | 1490 | 0 | 1990 | 0 | 23 |